title: "Demystifying the Carnegie Classifications"
author: "Paul Harmon"
date: "April 17, 2017"
output:
  beamer_presentation:
    toc: true
    theme: "AnnArbor"
    colortheme: "crane"
    fonttheme: "structurebold"

Introduction: What are the Carnegie Classifications?

Montana State (R-2), Stanford (R-1), Boise State (R-3):

Montana State University: A History

2015 Carnegie Classifications

Montana State’s Nearest Neighbors

Nearest Institutions to Montana State

Principal Components Analysis

Goal: To reduce p predictors to k < p components via eigenvalue decomposition. PCA can be done on the covariance matrix of the raw (unscaled) data or on the correlation matrix, which is equivalent to standardizing each predictor first.

Given p predictors \(x_1, x_2, \ldots, x_p\), an eigenvalue decomposition of the covariance (or correlation) matrix of X generates a set of p new variables \(y_1, y_2, \ldots, y_p\). The \(y\)’s are ordered so that \(y_1\) explains the most variation in the underlying \(x\)’s, and \(y_p\) the least.
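
In matrix form, the decomposition described above can be sketched as follows. This assumes the columns of X have been centered and writes \(S\) for their sample covariance (or correlation) matrix; the symbols \(S\), \(V\), \(\Lambda\), \(v_j\), and \(\lambda_j\) are notation introduced here for illustration, not taken from the slides.

\[ S = V \Lambda V^{\top}, \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p), \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0 \]

\[ y_j = v_j^{\top} x, \qquad \mathrm{Var}(y_j) = \lambda_j, \qquad \frac{\lambda_j}{\sum_{k=1}^{p} \lambda_k} = \text{share of total variation explained by } y_j \]

Here \(x = (x_1, \ldots, x_p)^{\top}\) and \(v_j\) is the \(j\)-th column of \(V\), so the ordering of the eigenvalues is what orders the \(y\)’s from most to least variation explained.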

Scores: The new set of covariates. These are linear combinations (weighted sums) of the old covariates.

Loadings: The weights in those linear combinations. The loadings give the formula used to calculate the scores from the original covariates.

But how do we do dimension reduction? Since we know how much variation in \(x\) is explained by each \(y\)-score, we can keep only the first k scores, which explain the most variation, and use them in place of the original p covariates. The Carnegie Classifications use only the first score from each PCA run.
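
The scores, loadings, and variance explained described above can be illustrated with a minimal Python sketch using NumPy. The function name `pca`, the simulated data, and the choice to standardize the columns are assumptions made for illustration; this is not the code behind the slides.

```python
import numpy as np

def pca(X, scale=True):
    """Return (scores, loadings, explained) for an n x p data matrix X."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)                  # center each column
    if scale:
        Z = Z / X.std(axis=0, ddof=1)       # standardize: PCA on the correlation matrix
    S = np.cov(Z, rowvar=False)             # p x p covariance of the prepared data
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so the largest eigenvalue comes first
    eigvals = eigvals[order]
    loadings = eigvecs[:, order]            # column j holds the weights for score j
    scores = Z @ loadings                   # new covariates (signs are arbitrary)
    explained = eigvals / eigvals.sum()     # share of variation per component
    return scores, loadings, explained

# Simulated data (assumption): three covariates driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
X = latent @ np.array([[1.0, 0.8, 0.6]]) + 0.3 * rng.normal(size=(50, 3))

scores, loadings, explained = pca(X)
first_score = scores[:, 0]                  # keep only the first score, as on the slide
print(np.round(explained, 3))
```

Keeping `scores[:, :k]` for some k < p is the dimension-reduction step; using only `scores[:, 0]` corresponds to keeping the first score alone.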

How are the Classifications Calculated?

The classifications are calculated from two indices of institutional output. The first, the aggregate index, combines the number of PhDs awarded by the institution with its research expenditures and research staff; the second, the per-capita index, measures research expenditures and research staff relative to faculty size.

Aggregate Index:
\[ \text{Ag.Index}_{i} = \text{HumanitiesPhD}_{i} + \text{StemPhD}_{i} + \text{SocialSciencePhD}_{i} + \text{OtherPhD}_{i} + \text{StemExpenditures}_{i} + \text{NonStemExpenditures}_{i} + \text{ResearchStaff}_{i} \]

Per Capita Index:
\[ \text{PC.Index}_{i} = \frac{\text{ResearchStaff}_{i} + \text{StemExpenditures}_{i} + \text{NonStemExpenditures}_{i}}{\text{FacultySize}_{i}} \]
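
As a quick arithmetic check, the two formulas can be read literally in a few lines of Python. The institution and all of its numbers below are made up, units are ignored, and the function names are hypothetical; the sketch mirrors the formulas as written and leaves out the ranking and PCA weighting described on the following slides.

```python
def aggregate_index(inst):
    """Sum of the seven aggregate-index terms for one institution."""
    return (inst["HumanitiesPhD"] + inst["StemPhD"] + inst["SocialSciencePhD"]
            + inst["OtherPhD"] + inst["StemExpenditures"]
            + inst["NonStemExpenditures"] + inst["ResearchStaff"])

def per_capita_index(inst):
    """Research staff and expenditures divided by faculty size."""
    return (inst["ResearchStaff"] + inst["StemExpenditures"]
            + inst["NonStemExpenditures"]) / inst["FacultySize"]

# Hypothetical institution with made-up values (not real Carnegie data).
example = {
    "HumanitiesPhD": 5, "StemPhD": 40, "SocialSciencePhD": 10, "OtherPhD": 5,
    "StemExpenditures": 100, "NonStemExpenditures": 20, "ResearchStaff": 30,
    "FacultySize": 500,
}

print(aggregate_index(example))   # 210
print(per_capita_index(example))  # 0.3
```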

Replicating the Classifications

The scores were calculated using the following methods:
  • Rank each institution on the covariates
  • Calculate Principal Component Scores for each index
  • Plot the first PC score from the aggregate index vs the first PC score from the per-capita index
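
The three steps above can be sketched in Python with NumPy. Everything here is an assumption made for illustration: the data are simulated, the helper names `min_rank` and `first_pc_score` are hypothetical, ranks are ascending with ties given the minimum rank, and the ranks are standardized before the PCA. It is not a reproduction of the Carnegie calculations.

```python
import numpy as np

def min_rank(col):
    """Ascending ranks; tied values all receive the smallest rank in their group."""
    col = np.asarray(col, dtype=float)
    return (col[:, None] > col[None, :]).sum(axis=1) + 1

def first_pc_score(X):
    """First principal component score of the standardized columns of X."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    v1 = eigvecs[:, np.argmax(eigvals)]     # loadings of the leading component
    if v1.sum() < 0:                        # the sign is arbitrary; make it repeatable
        v1 = -v1
    return Z @ v1

# Simulated counts for 30 hypothetical institutions (columns follow the
# aggregate index: four PhD counts, two expenditure measures, research staff).
rng = np.random.default_rng(1)
n = 30
size = rng.gamma(2.0, 1.0, size=n)
aggregate_raw = rng.poisson(lam=np.outer(size, [5, 20, 8, 4, 50, 10, 15]))
faculty = rng.integers(200, 1500, size=n)
per_capita_raw = aggregate_raw[:, 4:7] / faculty[:, None]

# Step 1: rank each institution on every covariate.
aggregate_ranks = np.column_stack([min_rank(c) for c in aggregate_raw.T])
per_capita_ranks = np.column_stack([min_rank(c) for c in per_capita_raw.T])

# Step 2: first principal component score for each index.
agg_score = first_pc_score(aggregate_ranks)
pc_score = first_pc_score(per_capita_ranks)

# Step 3: these two vectors are the axes of the scatterplot.
print(np.round(np.column_stack([agg_score, pc_score])[:5], 2))
```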

Methods for Ties

The scores are based on ranks, not the original data. Since most of the variables used are counts, many schools will be tied, especially for the aggregate index, so the choice of tie-handling method matters. R's rank() function handles ties in several ways (set with the ties.method argument):
  • Average (default): If the first 3 schools are tied, each would get rank 2 (the mean of 1, 2, and 3).
  • Minimum: If the first 3 schools are tied, each would get rank 1.
  • Maximum: If the first 3 schools are tied, each would get rank 3.
  • First: If the first 3 schools are tied, they get ranks 1, 2, 3 in the order they appear in the data.
  • Last: If the first 3 schools are tied, they get ranks 3, 2, 1 in the order they appear in the data.
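
To see how the tie methods differ on the same data, here is a small Python sketch. The function `rank_with_ties` is a hypothetical stand-in written for illustration, not R's own code; it covers the five methods listed here and leaves out R's "random" option.

```python
def rank_with_ties(values, method="average"):
    """Ascending 1-based ranks with a chosen rule for tied values."""
    n = len(values)
    # sorted() is stable, so tied items stay in their order of appearance.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        lo, hi = start + 1, end + 1          # ranks spanned by this tied group
        for k, idx in enumerate(order[start:end + 1]):
            if method == "average":
                ranks[idx] = (lo + hi) / 2
            elif method == "min":
                ranks[idx] = lo
            elif method == "max":
                ranks[idx] = hi
            elif method == "first":
                ranks[idx] = lo + k          # increasing in order of appearance
            elif method == "last":
                ranks[idx] = hi - k          # decreasing in order of appearance
            else:
                raise ValueError(f"unknown method: {method}")
        start = end + 1
    return ranks

# Made-up PhD counts: the first three schools are tied at zero.
counts = [0, 0, 0, 4, 7]
for method in ["average", "min", "max", "first", "last"]:
    print(method, rank_with_ties(counts, method))
# average [2.0, 2.0, 2.0, 4.0, 5.0]
# min [1, 1, 1, 4, 5]
# max [3, 3, 3, 4, 5]
# first [1, 2, 3, 4, 5]
# last [3, 2, 1, 4, 5]
```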

Where were lines drawn?

Unstandardized Scores

How do we change?

Single Metric Changes: Aggregate Index

Single Metric Changes:

Next Steps:

The Carnegie Classifications were based on a number of seemingly arbitrary choices. Why use the minimum rank instead of the average? Why un-scale prior to drawing the lines between groups? Why draw lines where they drew them? Rather than clustering on the first PC score for two indices, why not include the first two PC scores?

References